DEVELOPMENT AND STANDARDIZATION OF THE AIR FORCE OFFICER
QUALIFYING TEST FORM M



I. INTRODUCTION 

In 1953, a selected group of paper-and-pencil subtests from the
World War II aircrew classification batteries was combined with an
academic aptitude test called the Aviation Cadet-Officer Candidate
Qualifying Test. The result was a new operational instrument known
as the Air Force Officer Qualifying Test (AFOQT). This test has
remained the backbone of the Air Force officer selection and
classification testing program down to the present. During its
twenty years of use, thirteen different forms of the test were
constructed, and from time to time other tests were derived from it
to meet special needs. The entire history of this effort has been
documented (Valentine & Creager, 1961; Miller & Valentine, 1964;
Miller, 1966; Miller, 1968; Miller, 1970; Miller, 1972). Extensive
technical data pertaining to the AFOQT have been summarized in a
report on interpretation and use of AFOQT scores (Miller, 1969).

The AFOQT is used to select candidates for most programs leading to
a commission, with the Air Force Academy the only major exception.
It is also used to select candidates for undergraduate pilot and
navigator training, and to assist in assigning nonflying officers
entering their initial tour of active duty. Under current production
schedules, each form of the AFOQT serves these functions for the Air
Force throughout a two-year cycle and is then retired. In accordance
with this cycle, AFOQT Form M was scheduled for introduction in the
AFROTC commissioning program on 1 September 1973, approximately
coinciding with the beginning of a new academic year, and in all
other programs on 1 January 1974.

II. GENERAL CHARACTERISTICS

AFOQT Form M was constructed according to the same plan as all its
recent predecessors. It consists of 522 test items organized into
thirteen subtests from which five composite scores are obtained.
These are the Pilot, Navigator-Technical, Officer Quality, Verbal,
and Quantitative composites. Only these composites are used in ways
which affect the composition of the Air Force and the careers of
individuals. Scoring by subtests is done for research. The
composition of the test is shown in Table 1.



Table 1. Content and Organization of AFOQT Form M(a)

                                                                  Composites
                                      No. of           Navigator-   Officer
Booklet and Subtest                    Items   Pilot   Technical    Quality   Verbal   Quantitative

Booklet 1 (AFPT 972)
  Quantitative Aptitude                   60               X           X                    X

Booklet 2 (AFPT 973)
  Verbal Aptitude                         60                           X        X
  Officer Biographical Inventory(b)       96                           X

Booklet 3 (AFPT 974)
  Scale Reading(c)                        48               X
  Aerial Landmarks(c)                     40               X
  General Science                         24               X

Booklet 4 (AFPT 975)
  Mechanical Information                  24     X         X
  Mechanical Principles                   24     X         X

Booklet 5 (AFPT 976)
  Pilot Biographical Inventory            50     X
  Aviation Information                    24     X
  Visualization of Maneuvers(c)           24     X
  Instrument Comprehension(c)             24     X
  Stick and Rudder Orientation(c)         24     X

Total                                    522

(a) Associated administrative and scoring manuals are AFPT 970 and
971, respectively. Associated answer sheets are AFPT 967 and 968.
Special manuals and answer forms are used in the AFROTC program.
Scale Reading and Aerial Landmarks are scored R - W/4. Visualization
of Maneuvers and Instrument Comprehension are scored R - W/3. Other
subtests are scored rights only.

(b) Not administered to female applicants.

(c) Speeded subtests.
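
The scoring rules in the Table 1 note can be illustrated with a short sketch. It assumes that R is the number of right answers and W the number of wrong answers, which is the usual reading of such formulas but is not spelled out in the note. The function name and the answer counts are hypothetical.

```python
def formula_score(rights: int, wrongs: int, divisor: int) -> float:
    """Return R - W/divisor, the rights-minus-fraction-of-wrongs score."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return rights - wrongs / divisor


# Hypothetical answer counts, for illustration only.
scale_reading = formula_score(rights=30, wrongs=8, divisor=4)  # scored R - W/4
maneuvers = formula_score(rights=18, wrongs=6, divisor=3)      # scored R - W/3
print(scale_reading, maneuvers)
```

Subtests scored rights only simply use R, so omitted and wrong answers are treated alike.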
Form M is published in five test booklets which are accompanied by
administrative, scoring, and interpretive manuals, a set of six hand
scoring keys, and two special Digitek answer sheets. The answer
sheets and interpretive manual are unchanged from the preceding
form. The scoring manual contains three sets of tables for
converting raw scores to percentiles. Selection of the proper set of
tables is done on the basis of the educational level of the
examinee. The educational level in the various programs where the
test is used can vary from college freshman to college graduate. The
use of separate conversion tables for different levels is supported
by two studies (Gregg, 1968; Tupes & Miller, 1969) which provide
quantitative evaluation of the elevating effect of education on
AFOQT scores.

III. ITEM SELECTION

Each form of the AFOQT is calculated to have the same difficulty as
the preceding form. The selection of items is guided by the
principle that the item of median difficulty in each subtest should
be answered correctly by 50 percent of the examinees for whom the
test is appropriate, with the other items in the subtest having a
considerable range of difficulty about the median. The only
exceptions are the two biographical subtests, for which the concept
of difficulty has a somewhat different meaning. Biographical items
in a sense have no right or wrong answers, but responses are
considered right or wrong according to whether they do or do not
conform to the scoring key.

The median difficulty and range of difficulty of items in Form M,
except the biographical items, are shown in Table 2. Difficulties in
the table are expressed as proportions of examinees who answer the
items correctly. Thus, the higher values represent the easier items.
The desired median difficulty is closely approximated in each
subtest, but the range of difficulty is somewhat narrow in the
spatial subtests. A narrow range for spatial items has characterized
previous forms of the AFOQT.



Table 2. Item Difficulty Levels and Internal Consistency of AFOQT Form M(a)

                                Difficulty Level       Internal Consistency
Subtest                         Range      Median      Range        Median

Quantitative Aptitude           .12-.87    .52         .2[?]-.85    .50
Verbal Aptitude                 .14-.85    .54         .26-.84      .46
Scale Reading                   .20-.81    .57         [illegible]  .44
Aerial Landmarks                .26-.82    .52         .27-.81      .53
General Science                 .13-.92    .52         .12-.78      .[?]8
Mechanical Information          .18-.89    .50         .28-.79      .53
Mechanical Principles           .22-.89    .54         .10-.60      .37
Aviation Information            .27-.82    .52         .24-.66      .42
Visualization of Maneuvers      .28-.85    .69         .24-.68      .38
Instrument Comprehension        .32-.85    .62         .27-.69      .46
Stick and Rudder Orientation    .51-.84    .72         .24-.66      .51

(a) Based on samples of 400 or more student officers.
Table 2 also presents internal consistency data for Form M. Internal
consistency refers to the correlation between the correct response
to an item and the total score of the subtest of which the item is a
part. Again the biographical subtests are a special case. Low
internal consistency is to be expected of them. In other subtests it
is desired that the internal consistency be high, but it is not
possible to have uniformly high internal consistency in items having
the desired distribution of difficulty. The range and median of the
internal consistency distributions for Form M are similar to those
for other forms of the AFOQT. No items having positive internal
consistency coefficients for any incorrect response were included in
the test. Some anchor items which appeared in previous forms were
included.
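
The two item statistics described in this section can be sketched as follows. This is an illustration only: the report does not say which correlation coefficient was used, so the sketch assumes an ordinary Pearson correlation between the 0/1 item score and the subtest total (the point-biserial case). The response matrix and function names are hypothetical.

```python
from math import sqrt


def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (1 = right)."""
    return sum(item_scores) / len(item_scores)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def item_analysis(matrix):
    """Return (difficulty, item-total correlation) for each item.

    matrix[i][j] is 1 if examinee i answered item j correctly, else 0.
    """
    totals = [sum(row) for row in matrix]
    results = []
    for j in range(len(matrix[0])):
        item = [row[j] for row in matrix]
        results.append((item_difficulty(item), pearson(item, totals)))
    return results


# Hypothetical responses: 4 examinees by 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for difficulty, consistency in item_analysis(responses):
    print(round(difficulty, 2), round(consistency, 2))
```

The same correlation, computed for an indicator of each incorrect option instead of the keyed answer, would give the coefficients that the text says must not be positive.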



IV. RELIABILITY, INTERCORRELATIONS, AND VALIDITY

Though various forms of the AFOQT are used consecutively, they have
in effect the properties of alternate forms. It is therefore assumed
initially that such technical data as reliability, validity, and
intercorrelations of composites for a new form are similar to the
corresponding data for preceding forms. This assumption cannot be
tested adequately in the standardization sample because the strategy
does not require that this sample be representative of the
population on which the test is to be used. Moreover, validity
studies usually include data from more than one form.

Reliability and intercorrelation data for the composites are
presented in Tables 3 and 4. These are based on previous forms but
are considered to be estimates for Form M. The reliability data are
determined from the formula for the reliability of a composite
(Wherry & Gaylord, 1943), which in turn is based on test-retest or
Kuder-Richardson Formula 20 data for the subtests. The biographical
subtests are omitted.
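
As an illustration of one of the subtest statistics mentioned here, the sketch below computes Kuder-Richardson Formula 20 for a matrix of right/wrong item scores. It uses the population variance of total scores, a common convention that the report does not specify, and it does not attempt to reproduce the Wherry and Gaylord composite formula. The data are hypothetical.

```python
def kr20(matrix):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    matrix[i][j] is 1 if examinee i answered item j correctly, else 0.
    KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores).
    """
    n = len(matrix)
    k = len(matrix[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n
    if variance == 0:
        raise ValueError("KR-20 undefined when all total scores are equal")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # proportion correct on item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / variance)


# Hypothetical responses: 4 examinees by 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(responses))
```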



Table 3. Estimated Reliability of Composites, AFOQT Form M

Composite               Reliability

Pilot                   .91
Navigator-Technical     .95
Officer Quality         .94
Verbal                  .89
Quantitative            .93


Table 4. Estimated Intercorrelation of Composites, AFOQT Form M

                                 Navigator-   Officer
Composite               Pilot    Technical    Quality    Verbal

Navigator-Technical     .70
Officer Quality         .50      .79
Verbal                  .43      .57          .80
Quantitative            .50      .[?]7        .85        .55


A convenient summary of validity data is contained in the AFOQT
Manual for Interpretation, AFPT 901, and in the technical report on
interpretation and use of AFOQT scores (Miller, 1969). The most
recent validities for performance in flying training are to be found
in the development report for Form L (Miller, 1972).



V. STANDARDIZATION 

The AFOQT has traditionally been standardized on an Air Force
Academy candidate group. After 1960, Academy candidates were no
longer available for this purpose, but a method was devised for
indirectly relating a new AFOQT form to a prior Academy candidate
group. The specific group was made up of 5,105 candidates for the
class of 1964. The indirect method has been described in general
(Dailey, Shaycoft, & Orr, 1962), and in its specific application to
the AFOQT (Miller & Valentine, 1964). Briefly, the method consists
of equipercentile conversions from AFOQT Form G, which was
administered to Academy candidates, through composites of tests from
the Project TALENT battery to the new form of the AFOQT. The
relationship between the TALENT composites and the new form is
determined on samples of basic airmen stratified on the Armed Forces
Qualification Test (AFQT) by deciles in the percentile range from 21
to 100. The composition of the TALENT composites is given in
Table 5. The TALENT Academic composite is equivalent to the AFOQT
Officer Quality composite with its biographical inventory omitted.
Correlations of the TALENT composites with the corresponding AFOQT
composites range from .80 to .88.
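
The idea of a chained equipercentile conversion can be sketched in a few lines. This is a simplified illustration and not the procedure documented by Dailey, Shaycoft, and Orr (1962): it uses mid-percentile ranks and picks the nearest observed score, with no smoothing or interpolation. All function names and score lists are hypothetical.

```python
def percentile_rank(scores, x):
    """Mid-percentile rank of x within a list of observed scores."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def equipercentile(x, from_scores, to_scores):
    """Observed score on the 'to' test whose percentile rank is closest
    to the percentile rank of x on the 'from' test."""
    target = percentile_rank(from_scores, x)
    candidates = sorted(set(to_scores))
    return min(candidates,
               key=lambda y: abs(percentile_rank(to_scores, y) - target))


def chained_conversion(x, form_g, talent_link, talent_airmen, new_form):
    """Convert a Form G score to the new form through a TALENT composite.

    form_g / talent_link: scores from the group that links Form G to TALENT.
    talent_airmen / new_form: scores from the airman sample that links
    TALENT to the new form.
    """
    talent_score = equipercentile(x, form_g, talent_link)
    return equipercentile(talent_score, talent_airmen, new_form)


# Hypothetical score distributions, for illustration only.
form_g = [1, 2, 3, 4, 5]
talent_link = [10, 20, 30, 40, 50]
talent_airmen = [10, 20, 30, 40, 50]
new_form = [100, 200, 300, 400, 500]
print(chained_conversion(4, form_g, talent_link, talent_airmen, new_form))
```

Each link only requires that both tests in the pair were taken by the same group, which is why the Academy candidates never need to take the new form.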

Standardization of all five AFOQT composites 
should ideally make use of the same stratified 
basic airman sample. In practice, this requires an 
unreasonable amount of testing time per 
examinee. For this reason, three stratified samples 
were used. One sample was for standardization of 
Table 5. Composition of TALENT Composites Corresponding to
AFOQT Form M Composites(a)

Weights are those of each test in the TALENT Pilot, Nav-Tech,
Academic, Verbal, and Quantitative composites.
[The column position of each weight is not recoverable from the
source; legible weights are listed in the order in which they appear.]

                                                 No. of
TALENT Test                                      Items         Weights

102  Vocabulary (Information)                    [illegible]   [illegible]
103  Literature (Information)                    [illegible]   [illegible]
[illegible]                                      [illegible]   3, 2, [illegible], 2
110  Aeronautics and Space (Information)         10            3, 2, 3
111  Electricity and Electronics (Information)   20            1, 2
112  Mechanics (Information)                     19            3
250  Reading Comprehension                       48            1, 1
270  Mechanical Reasoning                        20            3, 3
281  Visualization in Two Dimensions             24            1
282  Visualization in Three Dimensions           16            2, 3
312  Mathematics II. Introductory                24            3, 2, 2
333  Mathematics III. Advanced                   14            2, 3, 3

     Total                                       263

(a) Data assembled from Dailey, Shaycoft, & Orr (1962, Table 9).



the Pilot composite, one for the Navigator-Technical composite, and
one for the Officer Quality composite and its constituent Verbal and
Quantitative composites. The three samples were compared, two at a
time, in terms of their AFQT score distributions, and each pair was
tested for significant differences by chi-square. The results are
shown in Table 6. Differences in the samples are very small, and
none are statistically significant.
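
A chi-square comparison of two samples across score bands can be sketched as below. The report does not give the cell counts, so the counts here are hypothetical. With eight AFQT decile bands between the 21st and 100th percentiles and two samples, the degrees of freedom are (8 - 1)(2 - 1) = 7, which agrees with the df reported in Table 6.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic and df for two samples over the
    same set of categories (a 2 x k contingency table)."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples need the same categories")
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            raise ValueError("empty category; drop it before testing")
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * column_total / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(counts_a) - 1) * (2 - 1)
    return statistic, df


# Hypothetical decile counts for two stratified samples.
pilot_sample = [117, 117, 117, 117, 117, 117, 117, 117]
nav_tech_sample = [118, 116, 117, 117, 117, 118, 116, 117]
print(chi_square_homogeneity(pilot_sample, nav_tech_sample))
```

A statistic close to zero, as in Table 6, indicates that the two samples are distributed almost identically across the AFQT bands.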

Data obtained from monitoring the operational use of the AFOQT
suggest that the test has unexpectedly become too difficult. In all
programs combined, from 56 to 68 percent of the examinees scored at
or above the 25th percentile on the three composites in Form L for
which minimum qualifying scores are established. On the Verbal and
Quantitative composites, where there are no minimum qualifying
scores, the percentages are 62 and 45, respectively, in all programs
other than AFROTC. In every case, the theoretically expected
percentage is 75. The most severely affected composites are those
containing the Quantitative Aptitude subtest.



Table 6. Homogeneity of AFOQT Form M Normative
Samples with Respect to AFQT Deciles

Samples Compared                           Chi-Square    df    p

Pilot and Navigator-Technical              0.172         7     >.99
Pilot and Officer Quality                  0.023         7     >.99
Navigator-Technical and Officer Quality    0.079         7     >.99

Prior to AFOQT-64, it was considered desirable to correct AFOQT
conversion tables to compensate for the self-selection process among
Academy candidates which resulted in extremely high academic
aptitude scores, especially in the quantitative domain. This
correction is described elsewhere (Valentine & Creager, 1961). It
involved referring score distributions to an earlier and less highly
self-selected Academy candidate group, and it principally affected
the composites containing the Quantitative Aptitude subtest.
Beginning with AFOQT-64, the correction was dropped because it
tended to make the test too easy.

The recent operational data suggest that it is appropriate to
reinstate this correction. This was accomplished readily because
data are available relating both corrected and uncorrected AFOQT
Form G distributions to the TALENT composites. An abridgment of the
Form M conversion tables for examinees having less than two years of
college is presented in Table 7, together with the tables as they
would appear without the correction. The Pilot and Verbal composites
are almost unaffected. The effect on the other composites should
produce an increase in qualification rates without changing the
AFOQT normative base in any fundamental way. Similar results can be
expected at other educational levels where the AFOQT is used.



Table 7. Conversion Tables for AFOQT Form M Examinees
with Less than Two Years of College

                             Corrected                                          Uncorrected
                                Male                                               Male
                      Nav-      Officer                                  Nav-      Officer
Percentile  Pilot     Tech      Quality   Verbal    Quant      Pilot     Tech      Quality   Verbal    Quant

90-95       136-204   109-220   118-200   48-60     35-60      135-204   115-220   120-200   49-60     41-60
80-85       123-135   97-108    111-117   42-47     31-34      125-134   104-114   114-119   45-48     38-40
70-75       114-122   91-96     107-110   39-41     28-30      115-124   94-103    110-113   41-44     34-37
60-65       106-113   85-90     103-106   37-38     26-27      107-114   87-93     106-109   38-40     32-33
50-55       98-105    80-84     99-102    35-36     24-25      99-106    82-86     103-105   36-37     30-31
40-45       92-97     75-79     95-98     33-34     22-23      92-98     78-81     100-102   34-35     28-29
30-35       85-91     69-74     90-94     30-32     20-21      85-91     72-77     94-99     32-33     26-27
20-25       77-84     63-68     84-89     26-29     18-19      77-84     65-71     88-93     28-31     24-25
10-15       68-76     57-62     78-83     22-25     14-17      66-76     55-64     80-87     23-27     20-23
01-05       0-67      0-56      0-77      0-21      0-13       0-65      0-54      0-79      0-22      0-19
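
Use of a conversion table of this kind amounts to a range lookup. The sketch below encodes only the corrected Pilot column of the abridged Table 7 and returns the percentile row label for a raw score. The function and variable names are hypothetical, and the operational tables in the scoring manual are more detailed than this abridgment.

```python
# (lowest raw score, highest raw score, percentile row label),
# taken from the corrected Pilot column of the abridged Table 7.
PILOT_CORRECTED = [
    (136, 204, "90-95"),
    (123, 135, "80-85"),
    (114, 122, "70-75"),
    (106, 113, "60-65"),
    (98, 105, "50-55"),
    (92, 97, "40-45"),
    (85, 91, "30-35"),
    (77, 84, "20-25"),
    (68, 76, "10-15"),
    (0, 67, "01-05"),
]


def percentile_row(raw_score, table=PILOT_CORRECTED):
    """Return the percentile row label whose raw score band holds raw_score."""
    for low, high, label in table:
        if low <= raw_score <= high:
            return label
    raise ValueError(f"raw score {raw_score} is outside the table")


print(percentile_row(100))
```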



VI. SCORE DISTRIBUTIONS

The stratified samples of basic airmen used in standardizing Form M
are compared in Table 8 with similarly stratified samples of basic
airmen on which the relationships between TALENT and AFOQT
composites were originally determined. The comparison is in terms of
cumulative frequency distributions of TALENT composite scores. No
correction is incorporated into this table, so it may be compared
directly with corresponding tables for all AFOQT forms after Form C.
The table indicates that the TALENT composites are somewhat less
difficult for the Form M sample than for the original sample to
which Form G was administered. This suggests the possibility that
the recent increase in AFOQT disqualification rates may be related
to changes in the officer applicant population's relation to the
enlisted population.

Raw score means and standard deviations of Form M composites have
been computed only for the stratified basic airman samples. These
are reported in Table 9. Estimated means and standard deviations for
a 12th grade male sample and the
Table 8. Cumulative Percentage Distributions for TALENT
Composites in Original Air Force TALENT Sample and AFOQT Form M
Normative Samples

                              TALENT Composite

              Original AF TALENT Sample(a)       Form M Normative Samples(b)
AFOQT                      Nav-       Aca-                   Nav-         Aca-
Percentile    Pilot        Tech       demic      Pilot       Tech         demic

95            0.8          0.6        0.1        0.4         0.3          0.1
90            1.6          0.[?]      0.4        1.2         0.5          0.2
85            2.5          1.3        0.5        1.9         0.7          0.7
80            [illegible]  1.6        0.6        3.0         1.[?]        [illegible]
75            4.4          2.0        0.9        4.3         [illegible]  [illegible]
70            6.1          2.7        1.3        4.8         [illegible]  [?].5
65            7.4          3.2        1.7        6.5         3.3          3.0
60            9.2          3.8        2.2        8.1         4.[?]        3.6
55            10.4         4.5        2.7        10.5        5.4          5.0
50            11.6         5.3        3.4        12.7        6.3          5.5
45            13.3         6.2        3.8        16.0        7.5          6.3
40            15.5         7.3        4.5        19.0        9.1          7.8
35            17.7         8.3        5.4        21.4        12.8         9.8
30            21.2         10.2       6.6        26.0        14.1         11.9
25            25.1         12.4       8.2        30.9        17.8         15.1
20            29.3         15.2       10.5       35.3        22.3         18.8
15            34.8         18.6       13.6       42.2        27.9         23.6
10            42.5         23.5       18.4       50.8        35.6         30.2
05            56.6         34.4       29.5       64.7        49.4         44.5
01            100.0        100.0      100.0      100.0       100.0        100.0

(a) N = 2,489.

(b) Ns range from 935 to 937.

Table 9. Raw Score Means and Standard Deviations of AFOQT Form M
Composites for Five Groups

                                                               Examinees          Examinees          Examinees
                       Stratified         12th                 with Less than     with 2 or More     who are
                       Basic              Grade                2 Years            Years              College
                       Airmen(a)          Males(b)             College(c)         College(c)         Graduates(c)
AFOQT Composite        Mean     SD        Mean     SD          Mean     SD        Mean     SD        Mean     SD

Pilot                  69.8     25.0      68.6     29.0        96.5     22.7      100.0    24.2      103.0    [illegible]
Navigator-Technical    50.2     19.3      49.7     23.7        79.0     15.8      83.0     14.8      88.0     12.8
Officer Quality        74.0     15.[?]    72.7     18.8        98.0     11.5      104.5    10.0      110.5    7.0
Verbal                 22.8     9.7       22.1     13.1        34.5     6.0       37.5     5.8       40.5     5.8
Quantitative           16.1     7.4       18.2     7.4         23.5     6.0       25.5     5.8       28.5     5.8

(a) Stratified on AFQT decile in range of 21st through 100th
percentile. Ns vary from 935 to 937 for the various composites.

(b) Data estimated from unpublished tables by Dailey et al., 1962,
based on a 4 percent subsample of 12th grade males in the original
Project TALENT study. N = 2,403.

(c) Data estimated from AFOQT Form M conversion tables.
three educational levels in the operational population are also
shown. These estimates are based on conversion tables and are
somewhat inexact. However, with a few exceptions, they conform to
the expectation that means will